Week 7
Milestones
- Experiments with CLIP
  - Drew confusion matrices comparing the classification errors of CLIP and Tesseract; result: CLIP performs better (see the confusion-matrix sketch after this list).
 
- Set up a Git repository, shrivastava95/clip-ocr, for fine-tuning CLIP on a given dataset.
- Created a dataset of cropped word images from some pages of en-or.pdf.
- Implemented the baseline zero-shot approach (see the zero-shot sketch below).
- Implemented CoOp (https://arxiv.org/abs/2109.01134) as a cheaper alternative to fine-tuning CLIP (see the CoOp sketch below).
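
A minimal sketch of how the confusion-matrix comparison could be drawn with scikit-learn and matplotlib; the label arrays (`y_true`, `clip_preds`, `tess_preds`) are hypothetical stand-ins for the actual per-word predictions:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

# Hypothetical ground-truth labels and per-model predictions for the word crops
y_true = ["cat", "dog", "cat", "bird", "dog"]
clip_preds = ["cat", "dog", "cat", "bird", "cat"]
tess_preds = ["cat", "dog", "dog", "dog", "cat"]

# One confusion matrix per model, side by side for comparison
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, preds, title in [(axes[0], clip_preds, "CLIP"), (axes[1], tess_preds, "Tesseract")]:
    ConfusionMatrixDisplay.from_predictions(y_true, preds, ax=ax, colorbar=False)
    ax.set_title(title)
plt.tight_layout()
plt.show()
```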
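A minimal sketch of the baseline zero-shot approach, assuming the openai/CLIP package; the candidate vocabulary and the image path `word_crop.png` are hypothetical:

```python
import clip  # pip install git+https://github.com/openai/CLIP.git
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Hypothetical candidate vocabulary for a cropped word image
candidate_words = ["the", "quick", "brown", "fox"]
prompts = [f'a photo of the word "{w}"' for w in candidate_words]

image = preprocess(Image.open("word_crop.png")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

# Score the image against every prompt and pick the most similar word
with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

print("prediction:", candidate_words[probs.argmax().item()])
```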
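A rough sketch of the core CoOp idea: instead of hand-written prompts, a small set of continuous context vectors is learned while CLIP itself stays frozen. The module below is an illustrative simplification, not the paper's full implementation; the names, shapes, and usage values are assumptions:

```python
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    """CoOp-style prompt: learnable shared context vectors + frozen class-name embeddings."""
    def __init__(self, n_ctx: int, ctx_dim: int, class_token_embeds: torch.Tensor):
        super().__init__()
        # n_ctx learnable "context words", shared across all classes
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, ctx_dim))
        # Frozen token embeddings of the class names: [n_cls, n_name_tokens, ctx_dim]
        self.register_buffer("class_token_embeds", class_token_embeds)

    def forward(self) -> torch.Tensor:
        n_cls = self.class_token_embeds.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        # [n_cls, n_ctx + n_name_tokens, ctx_dim], fed to CLIP's frozen text encoder
        return torch.cat([ctx, self.class_token_embeds], dim=1)

# Hypothetical usage: 16 context tokens, 512-dim embeddings, 100 classes of 3 tokens each
learner = PromptLearner(n_ctx=16, ctx_dim=512, class_token_embeds=torch.randn(100, 3, 512))
prompts = learner()  # only learner.ctx receives gradients during training
print(prompts.shape)  # torch.Size([100, 19, 512])
```

This is why CoOp is cheaper than full fine-tuning: only the small `ctx` tensor is optimised, while all of CLIP's encoder weights remain fixed.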
 
 
Screenshots / Videos
Contributions
Learnings
- Learnt about OpenAI's CLIP model, a zero-shot model for measuring semantic similarity between image-text pairs.
  - This is done by projecting both modalities onto a common embedding space and taking the cosine similarity of the projections, as in the sketch below.
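
A small illustration of that similarity computation; the feature tensors here are random stand-ins for the outputs of CLIP's image and text encoders after projection:

```python
import torch
import torch.nn.functional as F

# Random stand-ins for CLIP's projected image and text features
image_features = torch.randn(4, 512)  # 4 images
text_features = torch.randn(3, 512)   # 3 captions

# L2-normalise so that the dot product equals cosine similarity
image_features = F.normalize(image_features, dim=-1)
text_features = F.normalize(text_features, dim=-1)

similarity = image_features @ text_features.T  # [4, 3] matrix of cosine similarities
print(similarity)
```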